fix(afdocs): add llms.txt directive to OpenAPI mirrors; drop mcp.json from llms.txt #6
Merged
… from llms.txt

Clears two AFDocs warnings flagged by `npx afdocs check https://docs.atomicmemory.ai` against PR #5 (round-2):

## llms-txt-directive-md (was 28/49 pages, 21 missing)

OpenAPI page mirrors are rendered from the vendored OpenAPI YAML in `scripts/mirror-markdown.mjs`, not from rendered HTML, so the remark plugin's directive injection never reached them. Their existing "Machine-readable: this page mirrors operation X" blockquote also didn't satisfy the AFDocs spec, which requires a blockquote that links to llms.txt.

Both `renderOpenapiOperation` and `renderOpenapiInfo` now emit `> Agent index: [llms.txt](/llms.txt)` as their first blockquote (matching what the remark plugin injects on hand-authored pages), and move the operation/info source-of-truth note to a regular paragraph below.

## llms-txt-directive-html (was 49/50 pages, 1 missing)

The "missing" page was `/.well-known/mcp.json`, a JSON file that AFDocs followed because llms.txt linked it as a documentation entry. JSON files don't have an HTML directive blockquote, so listing it as a doc link was a false positive against the directive check.

`build-llms-txt.mjs` no longer emits `.well-known/mcp.json` as a markdown link in the Optional section. Discovery for the MCP descriptor is via the well-known path itself; agents that care will look at `/.well-known/mcp.json` directly. A short non-link line in llms.txt still tells humans/agents the descriptor exists.

## Out of scope

- content-start-position (16 of 50 pages past 50%, all OpenAPI ref pages): the DOM has `<aside>` (sidebar) before `<article>`, and for short OpenAPI operation pages the sidebar text outweighs the article body, pushing the first content element past 50% in the HTML→text conversion. Real fix is a Docusaurus theme swizzle to reorder DOM (CSS keeps visual layout). Not in this PR.
- markdown-content-parity (~9 of 49 pages, avg 6% missing, improvement from PR #5's avg 13%): residual gap is mostly whitespace tokenization differences in code blocks; the spec threshold is 5% pass / 20% warn / >20% fail. Real wins from here require either turndown rule tuning or `data-markdown-ignore` annotations on theme elements. Not in this PR.
- content-negotiation (FAIL): GH Pages cannot honor `Accept: text/markdown`. Tracked as F1 (hosting migration).
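The content-negotiation follow-up (F1) amounts to a small rewrite decision at the edge. A minimal sketch of that decision as a pure function is below. It assumes each HTML page at `/path` has a mirror at `/path.md` (as with `build/platform/observability.md`); the function name `markdownRewritePath` is hypothetical, and the wiring into any particular host's edge runtime is left out.

```javascript
// Hypothetical helper: returns the .md mirror path to serve, or null when the
// request should be left alone. Assumes mirrors live at `<page path>.md`.
// Simplification: q-values in the Accept header are ignored.
function markdownRewritePath(acceptHeader, pathname) {
  const accept = (acceptHeader || "").toLowerCase();

  // Only rewrite when the client lists text/markdown explicitly.
  const wantsMarkdown = accept
    .split(",")
    .some((part) => part.trim().split(";")[0].trim() === "text/markdown");
  if (!wantsMarkdown) return null;

  // Leave paths that already carry a file extension untouched
  // (e.g. /llms.txt or /.well-known/mcp.json).
  if (/\.[a-z0-9]+$/i.test(pathname)) return null;

  // Drop trailing slashes; the site root has no obvious mirror name here.
  const trimmed = pathname.replace(/\/+$/, "");
  if (trimmed === "") return null;

  return `${trimmed}.md`;
}
```

Keeping the decision separate from the request handler means the same logic can be unit-tested without a hosting account.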
ethanj added a commit that referenced this pull request May 3, 2026
…in mirror

Three blockers caught in PR review against the AFDocs work merged in PR #5. Each lands with a focused fix.

## Blocker 1: AFDocs runtime deps removed from package.json

`6a97c0b` dropped `cheerio`, `turndown`, `turndown-plugin-gfm`, `js-yaml`, `@types/js-yaml`, `@types/turndown` from devDependencies. Those are required by `scripts/mirror-markdown.mjs`, so `npm run build` fails with `Cannot find module 'turndown'` from the `llms-and-mirror-plugin.mjs` postBuild path.

Restored all six in devDependencies (matching origin/main's pinned versions).

## Blocker 2: prebuild/prestart auto-regen reintroduced

`6a97c0b` re-added `prestart`/`prebuild` hooks that run `write:docs-mode && regen:api`. PR #5's commit `08c5312` explicitly removed `regen:api` from those hooks because it dirtied the 29 committed `.api.mdx` files on every build (non-deterministic re-encoding of the compressed `api:` blob). Verified the regression by running `npm run build` on this branch and seeing 6 `.api.mdx` files modified in the worktree afterward.

Kept `write:docs-mode` (it's needed for the new docs-mode flag) but dropped the `regen:api` chain. Spec refresh stays the explicit `vendor:spec → regen:api → commit` flow that `scripts/vendor-core-spec.mjs:14` already documents.

## Blocker 3: custom Heading swizzle leaked into the markdown mirror

The new `<Heading>` component (`src/theme/Heading/index.tsx`) wraps each heading's text in an anchor link with a `#` icon span, a copy-to-clipboard affordance for humans. The wrapping anchor uses CSS-module class names (`headingLink_no4V`, `headingIcon_Pk3T`) that the mirror's noise-selector strip list (which targets the default `a.hash-link`) doesn't match. Result: every heading in every mirror rendered as

    # [#Observability](#observability "Copy link to Observability")

instead of

    # Observability

Two-part fix:

1. `src/theme/Heading/index.tsx`: add `data-markdown-ignore` to the `#` icon span. AFDocs treats this attribute as the spec-compliant marker for HTML-only content that should not appear in markdown parity comparison; tooling that converts the page to markdown should also strip it.
2. `scripts/mirror-markdown.mjs`:
   - Add `[data-markdown-ignore]` to `ARTICLE_NOISE_SELECTORS` so the icon span is removed before turndown runs (defense for other tooling that might emit the same attribute).
   - Add a `clean-heading-text` turndown rule that intercepts `<h1>`–`<h6>` and emits clean `# Title` from `node.textContent` (with leading `#` chars stripped). This works for both default Docusaurus headings and the swizzled component, so the mirror no longer carries the wrapping anchor link as part of the heading text.

Verified: `npm run build` produces clean `# Observability` / `## The summary shapes` in `build/platform/observability.md`.

## Verification

- `npm run typecheck`: pass
- `npm run build`: pass; worktree has 0 `.api.mdx` files dirtied (down from 6 before this commit)
- 7/7 platform pages + 31/31 API reference pages have the llms.txt directive in HTML
- All four AFDocs artifacts present: `llms.txt`, `llms-full.txt`, `skill.md`, `.well-known/mcp.json`
- Sample mirror (`build/platform/observability.md`) shows clean ATX headings with no anchor-link leak

## Note for the round-3 PR

PR #6 also touches `scripts/mirror-markdown.mjs` and `scripts/build-llms-txt.mjs`. The two PRs don't overlap on the same lines, but whichever lands second will need a small rebase.
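For readers unfamiliar with turndown rules, the `clean-heading-text` rule described under Blocker 3 could look roughly like the sketch below. This is an illustration and not the code in `scripts/mirror-markdown.mjs`: the constant name `cleanHeadingText` and the whitespace normalization are assumptions. It relies only on turndown's documented rule shape (a `filter` plus a `replacement(content, node)` function).

```javascript
// Sketch of a turndown rule in the spirit of `clean-heading-text`.
// The rule object is plain data, so it can be exercised without turndown.
const cleanHeadingText = {
  // Intercept every heading level.
  filter: ["h1", "h2", "h3", "h4", "h5", "h6"],

  // Ignore the already-converted content (which would include the wrapping
  // anchor link) and rebuild the heading from the node's plain text.
  replacement(_content, node) {
    const level = Number(node.nodeName.charAt(1)); // "H2" -> 2
    const text = (node.textContent || "")
      .replace(/^[#\s]+/, "") // strip the leading "#" icon text
      .replace(/\s+/g, " ") // collapse internal whitespace (assumption)
      .trim();
    return `\n\n${"#".repeat(level)} ${text}\n\n`;
  },
};

// Usage, assuming a TurndownService instance named `turndown`:
//   turndown.addRule("clean-heading-text", cleanHeadingText);
```

Because the rule reads `node.textContent`, it behaves the same whether the heading contains the default Docusaurus hash link or the swizzled anchor.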
Round 3 of AFDocs fixes — directive coverage
Overview
The AFDocs scorecard for the live site (after PR #5) shows 93/100 (A) with 4 remaining issues. This PR clears the 2 directive-coverage warnings, which are quick wins. The remaining checks (content-start-position, the markdown-content-parity residual, content-negotiation) need deeper work and are explicitly out of scope below.
- llms-txt-directive-md
- llms-txt-directive-html
- content-start-position
- markdown-content-parity
- content-negotiation

Changes
Add directive to OpenAPI mirrors
`scripts/mirror-markdown.mjs`: `renderOpenapiOperation` and `renderOpenapiInfo` now emit `> Agent index: [llms.txt](/llms.txt)` as the first blockquote (same wording the remark plugin injects on hand-authored pages). The "this page mirrors operation X" note moves to a regular paragraph below.

OpenAPI .md mirrors render from the vendored YAML, so the remark plugin's directive injection (which runs on MDX) never reached them. ~30 API operation pages plus the rolled-up info page were missing the directive in their .md form.
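To illustrate the new ordering, here is a minimal sketch of a renderer that emits the directive as the only blockquote and demotes the source note to a paragraph. `renderOperationMirror`, its parameters, and the `getMemory` operation id in the usage note are hypothetical stand-ins; the real `renderOpenapiOperation` signature and the exact wording of the note are not shown in this PR. Placing the title above the directive is also an assumption.

```javascript
const AGENT_DIRECTIVE = "> Agent index: [llms.txt](/llms.txt)";

// Hypothetical stand-in for renderOpenapiOperation.
function renderOperationMirror({ title, operationId, body }) {
  return [
    `# ${title}`,
    "",
    // The directive is the first (and only) blockquote on the page.
    AGENT_DIRECTIVE,
    "",
    // The source-of-truth note is now a regular paragraph, not a blockquote.
    `Machine-readable: this page mirrors operation \`${operationId}\`.`,
    "",
    body.trim(),
    "",
  ].join("\n");
}
```

For example, `renderOperationMirror({ title: "GET /v1/memories/{id}", operationId: "getMemory", body: "Returns one memory." })` yields a mirror whose first line starting with `>` is the llms.txt directive.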
Drop `.well-known/mcp.json` from llms.txt

`scripts/build-llms-txt.mjs`: the "Optional" section listed `.well-known/mcp.json` as a markdown link, so AFDocs walked it as a doc page and flagged it for missing the HTML directive. Now a non-link prose line in llms.txt mentions that the descriptor exists; agents that care will look at the well-known path directly.

Out of scope
`content-start-position` (16/50, worst 69%)

Almost entirely OpenAPI reference pages. Diagnosis: the DOM has `<aside>` (sidebar) before `<article>`.

For short operation pages (e.g. `GET /v1/memories/{id}`) the sidebar's text content outweighs the article body, so the first meaningful content element lands past 50% in the converted text.

The fix requires a Docusaurus theme swizzle of the docs layout to render `<article>` before `<aside>` in the DOM, with CSS preserving the visual sidebar-on-left layout. That's a separate, larger change touching theme components.

`markdown-content-parity` (9/49, avg 6%, max 42%)

PR #5 already brought the average from 13% to 6% by switching to HTML→md rendering with cheerio + turndown + GFM. The residual gap is dominated by whitespace tokenization in code blocks (turndown joins Prism `.token-line` children with `\n`, but inline tokens don't always have spaces between them, so a word-level diff over-counts gaps).

Further wins need either turndown rule tuning per code language or theme-level `data-markdown-ignore` attributes on elements that exist only in HTML. That's a bigger investment than the round-3 directive fixes.

`content-negotiation` (FAIL)

GH Pages cannot honor `Accept: text/markdown`. F1 follow-up: migrate hosting to Cloudflare Pages / Vercel / Netlify with an edge function for the rewrite.

Verification
- `npm run build` clean
- `npm run typecheck` clean
- `.md` mirrors now lead with `> Agent index: [llms.txt](/llms.txt)`
- `.well-known/mcp.json`

🤖 Generated with Claude Code
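The two directive checks in the verification list can be spot-checked locally with a couple of string predicates. The sketch below is an approximation for local use, not AFDocs's own implementation; both function names are hypothetical, and the regular expressions only encode the behaviour described in this PR (a first blockquote that links to llms.txt, and no markdown link to `.well-known/mcp.json` inside llms.txt).

```javascript
// Matches a blockquote line that links to llms.txt, for example
// "> Agent index: [llms.txt](/llms.txt)".
const DIRECTIVE_RE = /^>.*\]\((?:[^)\s]*\/)?llms\.txt\)/;

// True when the first blockquote line in a .md mirror links to llms.txt.
function firstBlockquoteLinksLlmsTxt(markdown) {
  const first = markdown.split("\n").find((line) => line.startsWith(">"));
  return first !== undefined && DIRECTIVE_RE.test(first);
}

// True when llms.txt still contains a markdown link whose target ends in
// .well-known/mcp.json (a plain-text mention does not count).
function linksMcpDescriptor(llmsTxt) {
  return /\]\([^)]*\.well-known\/mcp\.json\)/.test(llmsTxt);
}
```

Running `firstBlockquoteLinksLlmsTxt` over each file under `build/` and `linksMcpDescriptor` over `build/llms.txt` would give a quick signal before re-running the full `npx afdocs check`.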